17.4 Extrinsic Methods
In summary, the central themes of sequence comparison are:¹⁸ distance functions
appropriate in the absence of a natural correspondence of elements; optimum
correspondences between sequences; and dynamic programming algorithms (Sect. 17.4.4)
for calculating the distances and optimum correspondences.
17.4.3 Trace, Alignment, and Listing
These are, perhaps, the three most important modes of presentation for the analysis
of differences between sequences. A trace consists of the source sequence above and
the target sequence below, with lines running from some elements in the source to
some in the target; each element has at most one line, and no two lines cross. The
lines provide at least a partial correspondence between source and target. A connected
pair is a match, of which there are two kinds: if the connected elements are the same,
the match is referred to as an identity or a continuation; if they are different, as a
substitution. A source element without a line is referred to as a deletion; a target
element without a line, as an insertion (the term indel means either an insertion or
a deletion). This is illustrated below.
Problem. Construct as many different analyses as possible of the above pair of
sequences using trace.
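The trace definitions above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `classify_trace`, the representation of lines as 0-based `(source_index, target_index)` pairs, and the particular lines chosen for the INDUSTRY/INTEREST pair are all assumptions made for the example.

```python
from typing import Dict, List, Tuple


def classify_trace(source: str, target: str,
                   lines: List[Tuple[int, int]]) -> Dict[str, list]:
    """Classify the elements of a trace (hypothetical helper).

    `lines` holds 0-based (source_index, target_index) pairs, one per line.
    """
    ordered = sorted(lines)

    # Every line must connect existing elements.
    for i, j in ordered:
        if not (0 <= i < len(source) and 0 <= j < len(target)):
            raise ValueError(f"line ({i}, {j}) points outside the sequences")

    # At most one line per element and no crossings: once sorted by source
    # index, both source and target indices must strictly increase.
    for (i1, j1), (i2, j2) in zip(ordered, ordered[1:]):
        if i2 == i1 or j2 <= j1:
            raise ValueError("lines share an element or cross each other")

    linked_source = {i for i, _ in ordered}
    linked_target = {j for _, j in ordered}
    return {
        # Connected pairs with equal elements.
        "identities": [(i, j) for i, j in ordered if source[i] == target[j]],
        # Connected pairs with different elements.
        "substitutions": [(i, j) for i, j in ordered if source[i] != target[j]],
        # Source elements without a line.
        "deletions": [i for i in range(len(source)) if i not in linked_source],
        # Target elements without a line.
        "insertions": [j for j in range(len(target)) if j not in linked_target],
    }


# Assumed example lines: I-I, N-N, T-T, R-R and Y-S.
result = classify_trace("INDUSTRY", "INTEREST",
                        [(0, 0), (1, 1), (5, 2), (6, 4), (7, 6)])
print(result["deletions"])   # [2, 3, 4]  (D, U, S)
print(result["insertions"])  # [3, 5, 7]  (E, E, T)
```

Because the validity check only requires strictly increasing indices, any other non-crossing set of lines for the same pair of sequences is accepted as a different analysis.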
An alignment or matching consists of, again, the source sequence above and the
target below, forming a two-row matrix. Both rows can be interspersed with null
characters (represented by ∅, or −, or simply a blank); note that a column consisting
only of null characters is not permitted. A column with the null character below is a
deletion; a column with the null character above is an insertion. The absence of ∅
in a column denotes a match; if the
elements are equal it is a continuation, if unequal a substitution:
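\[
\begin{bmatrix}
\mathrm{I} & \mathrm{N} & \mathrm{D} & \mathrm{U} & \mathrm{S} & \mathrm{T} & \emptyset & \mathrm{R} & \emptyset & \mathrm{Y} & \emptyset \\
\mathrm{I} & \mathrm{N} & \emptyset & \emptyset & \emptyset & \mathrm{T} & \mathrm{E} & \mathrm{R} & \mathrm{E} & \mathrm{S} & \mathrm{T}
\end{bmatrix}
\]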
Problem. Construct as many different analyses as possible of the above pair of
sequences using alignment.
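As a rough sketch of how the columns of an alignment can be read off mechanically, the Python below labels each column of the INDUSTRY/INTEREST alignment and recovers the corresponding trace. The names `NULL`, `classify_alignment` and `alignment_to_trace` are hypothetical, and the rows are assumed to be given as equal-length strings with ∅ as the null character.

```python
from typing import List, Tuple

NULL = "∅"  # assumed null character; "-" or a blank would serve equally well


def classify_alignment(top: str, bottom: str) -> List[str]:
    """Label each column of a two-row alignment (hypothetical helper)."""
    if len(top) != len(bottom):
        raise ValueError("both rows must have the same number of columns")
    labels = []
    for a, b in zip(top, bottom):
        if a == NULL and b == NULL:
            raise ValueError("a column of two null characters is not permitted")
        if b == NULL:
            labels.append("deletion")      # source element above a null
        elif a == NULL:
            labels.append("insertion")     # target element below a null
        elif a == b:
            labels.append("continuation")  # match of equal elements
        else:
            labels.append("substitution")  # match of unequal elements
    return labels


def alignment_to_trace(top: str, bottom: str) -> List[Tuple[int, int]]:
    """Return the lines of the trace implied by an alignment as 0-based
    (source_index, target_index) pairs, one per match column."""
    lines = []
    i = j = 0  # positions in the source and target without nulls
    for label in classify_alignment(top, bottom):
        if label in ("continuation", "substitution"):
            lines.append((i, j))
        if label != "insertion":
            i += 1  # the column consumed a source element
        if label != "deletion":
            j += 1  # the column consumed a target element
    return lines


top, bottom = "INDUST∅R∅Y∅", "IN∅∅∅TEREST"
print(classify_alignment(top, bottom))
print(alignment_to_trace(top, bottom))  # [(0, 0), (1, 1), (5, 2), (6, 4), (7, 6)]
```

Under these assumptions the alignment has four continuations, three deletions, three insertions and one substitution. Since consecutive indel columns can be reordered without changing the match columns, several alignments can yield the same trace.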
18 Kruskal (1964), Chap. 1 of Sankoff and Kruskal (1999).